Abstract

We construct a class of real-valued nonnegative binary functions on a
set of jointly distributed random variables, which satisfy the triangle in- 
equality and vanish at identical arguments (pseudo-quasi-metrics). These 
functions are useful in dealing with the problem of selective probabilistic 
causality encountered in behavioral sciences and in quantum physics. The 
problem reduces to that of ascertaining the existence of a joint distribu- 
tion for a set of variables with known distributions of certain subsets of 
this set. Any violation of the triangle inequality or its consequences by 
one of our functions when applied to such a set rules out the existence of 
this joint distribution. We focus on an especially versatile and widely ap- 
plicable pseudo-quasi-metric called an order-distance and its special case 
called a classification distance. 
We show how certain metric-like functions on jointly distributed random
variables (pseudo-quasi-metrics, introduced in Section 1) can be used in
dealing with the problem of selective probabilistic causality (introduced in
Section 2), illustrating this on examples taken from behavioral sciences and
quantum physics (Section 3). Although most of Section 2 applies to arbitrary
pseudo-quasi-metrics on jointly distributed random variables, we single out
one, termed order-distance, which is especially useful due to its versatility.
We discuss examples of other pseudo-quasi-metrics and rules for their
construction in Section 4.



1 Order p.q.-metrics 

Random variables in this paper are understood in the broadest sense, as
measurable functions $X : V_s \to V$, no restrictions being imposed on the
sample spaces $(V_s, \Sigma_s, \mu_s)$ and the induced probability spaces
$(V, \Sigma, \mu)$, with the usual meaning of the terms (sets of values
$V_s, V$, sigma-algebras $\Sigma_s, \Sigma$, and probability measures
$\mu_s, \mu$). In particular, any set $X$ of jointly distributed random
variables (functions on the same sample space) is a random variable, and its
induced probability space (or, simply, distribution)
$\overline{X} = (V, \Sigma, \mu)$ is referred to as the joint distribution of
its elements.

Given a class $\mathcal{X}$ of random variables, not necessarily jointly
distributed, let $\mathcal{X}^*$ be the class of distributions $\overline{X}$
for all $X \in \mathcal{X}$. For any class function
$f^* : \mathcal{X}^* \to \mathbb{R}$ (reals), the function
$f : \mathcal{X} \to \mathbb{R}$ defined by $f(X) = f^*(\overline{X})$ is
called observable (as it does not depend on sample spaces, typically
unobservable). We will conveniently confuse $f$ and $f^*$ for observable
functions, so that if $f$ is defined on $\mathcal{X}$, then $f(Y)$, identified
with $f^*(\overline{Y})$, is also defined for any $Y \notin \mathcal{X}$ with
$\overline{Y} \in \mathcal{X}^*$. (This convention is used in Section 2 when
we apply a function defined on a set of random variables $H$ to different but
identically distributed sets of $A$-variables.)

For an arbitrary nonempty set $\Omega$, let
$H = \{H_\omega : \omega \in \Omega\}$ be an indexed set of jointly
distributed random variables $H_\omega$ with distributions
$\overline{H}_\omega = (V_\omega, \Sigma_\omega, \mu_\omega)$. For any
$\alpha, \beta \in \Omega$, the ordered pair $(H_\alpha, H_\beta)$ is a random
variable with distribution
$(V_\alpha \times V_\beta, \Sigma_\alpha \times \Sigma_\beta, \mu_{\alpha,\beta})$,
and $H \times H$ is a set of jointly distributed random variables (hence also
a random variable).

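Definition 1.1. We call an observable function
$d : H \times H \to \mathbb{R}$ a pseudo-quasi-metric (p.q.-metric) on $H$ if,
for all $\alpha, \beta, \gamma \in \Omega$,

(i) $d(H_\alpha, H_\beta) \ge 0$,

(ii) $d(H_\alpha, H_\alpha) = 0$,

(iii) $d(H_\alpha, H_\gamma) \le d(H_\alpha, H_\beta) + d(H_\beta, H_\gamma)$.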

For terminological clarity, the conventional pseudometrics (also called
semimetrics) are obtained by adding the property
$d(H_\alpha, H_\beta) = d(H_\beta, H_\alpha)$; the conventional quasimetrics
are obtained by adding the property
$\alpha \ne \beta \Rightarrow d(H_\alpha, H_\beta) > 0$. A conventional metric
is both a pseudometric and a quasimetric. (See, e.g., Zolotarev, 1976, for
discussion of a variety of metrics and pseudometrics on random variables.)

By an obvious argument we can generalize the triangle inequality (iii): for
any $H_{\alpha_1}, \dots, H_{\alpha_l} \in H$ ($l \ge 3$),

$$d(H_{\alpha_1}, H_{\alpha_l}) \le \sum_{i=2}^{l} d(H_{\alpha_{i-1}}, H_{\alpha_i}). \tag{1}$$

We refer to this inequality (which plays a central role in this paper) as the
chain inequality.
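The "obvious argument" is an induction on $l$. A sketch of the inductive step,
written in the notation of Definition 1.1 (the base case $l = 3$ being the
triangle inequality (iii) itself), could read:

```latex
% For l > 3: apply (iii) to the triple (alpha_1, alpha_{l-1}, alpha_l),
% then use the chain inequality for the shorter sequence alpha_1..alpha_{l-1}.
\begin{align*}
d(H_{\alpha_1}, H_{\alpha_l})
  &\le d(H_{\alpha_1}, H_{\alpha_{l-1}}) + d(H_{\alpha_{l-1}}, H_{\alpha_l}) \\
  &\le \sum_{i=2}^{l-1} d(H_{\alpha_{i-1}}, H_{\alpha_i})
       + d(H_{\alpha_{l-1}}, H_{\alpha_l})
   = \sum_{i=2}^{l} d(H_{\alpha_{i-1}}, H_{\alpha_i}).
\end{align*}
```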
Let

$$R \subset \bigcup_{(\alpha,\beta) \in \Omega \times \Omega} V_\alpha \times V_\beta,$$

and we write $a \preceq b$ to designate $(a, b) \in R$. Let $R$ be a total
order, that is, transitive, reflexive, and connected in the sense that for any
$(a, b) \in \bigcup_{(\alpha,\beta) \in \Omega \times \Omega} V_\alpha \times V_\beta$,
at least one of the relations $a \preceq b$ and $b \preceq a$ holds. We define
the equivalence $a \sim b$ and the strict order $a \prec b$ induced by
$\preceq$ in the usual way. Finally, we assume that for any
$(\alpha, \beta) \in \Omega \times \Omega$, the sets

$$\{(a, b) : a \in V_\alpha, b \in V_\beta, a \preceq b\}$$

are $\mu_{\alpha,\beta}$-measurable. This implies the
$\mu_{\alpha,\beta}$-measurability of the sets

$$\{(a, b) : a \in V_\alpha, b \in V_\beta, a \prec b\}, \quad \{(a, b) : a \in V_\alpha, b \in V_\beta, a \sim b\}.$$

Thus, if all $V_\omega$ are intervals of reals, $\preceq$ can be chosen to
coincide with $\le$, and (assuming the usual Borel sigma algebra) all the
properties above are satisfied. Another example: for arbitrary $V_\omega$,
provided each $\Sigma_\omega$ contains at least $n > 1$ disjoint nonempty
sets, one can partition $V_\omega$ as $\bigcup_{k=1}^{n} V_\omega^{(k)}$ with
$V_\omega^{(k)} \in \Sigma_\omega$, and put $a \preceq b$ if and only if
$a \in V_\alpha^{(k)}$, $b \in V_\beta^{(l)}$ and $k \le l$. Again, all
properties above are clearly satisfied.

Definition 1.2. The function

$$D(H_\alpha, H_\beta) = \Pr[H_\alpha \prec H_\beta] = \int_{a \prec b} d\mu_{\alpha,\beta}(a, b)$$

is called an order p.q.-metric, or order-distance, on $H$.
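For finitely many values the integral in the definition of the order-distance
is a plain sum. The following is a minimal sketch, assuming a finite joint pmf
and an order given by class indices as in the partition example above; the
function name, the pmf, and the rank function are hypothetical and only serve
as illustration.

```python
from fractions import Fraction

def order_distance(joint, rank_a, rank_b):
    """Pr[A strictly precedes B] for a finite joint pmf.

    joint:  dict mapping (a, b) -> probability
    rank_a, rank_b: map a value to the index of its class in the partition;
                    a strictly precedes b iff rank_a(a) < rank_b(b).
    """
    return sum(p for (a, b), p in joint.items() if rank_a(a) < rank_b(b))

# Hypothetical pmf of two binary variables; value 0 is in class 1, value 1 in class 2.
joint = {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 4),
         (1, 0): Fraction(1, 8), (1, 1): Fraction(1, 8)}
rank = lambda v: v + 1

d_ab = order_distance(joint, rank, rank)                      # Pr[A = 0, B = 1]
swapped = {(b, a): p for (a, b), p in joint.items()}
d_ba = order_distance(swapped, rank, rank)                    # Pr[B = 0, A = 1]
```

Here `d_ab` and `d_ba` differ, which illustrates why the order-distance is in
general not symmetric.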

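That the definition is well-constructed follows from

Theorem 1.3. Order-distance $D$ is a p.q.-metric on $H$.

Proof. Let $\alpha, \beta, \gamma \in \Omega$, and $H_\alpha = A$,
$H_\beta = B$, and $H_\gamma = X$. That $D(A, B)$ is determined by the
distribution of $(A, B)$ is obvious from the definition. The properties
$D(A, B) \ge 0$ and $D(A, A) = 0$ are obvious too. To prove the triangle
inequality,

$$\begin{aligned}
D(A, B) = \Pr[A \prec B] ={} & \Pr[A \prec B \prec X] + \Pr[A \prec B \sim X] \\
 & + \Pr[A \prec X \prec B] + \Pr[A \sim X \prec B] + \Pr[X \prec A \prec B], \\
D(A, X) = \Pr[A \prec X] ={} & \Pr[A \prec X \prec B] + \Pr[A \prec B \sim X] \\
 & + \Pr[A \prec B \prec X] + \Pr[A \sim B \prec X] + \Pr[B \prec A \prec X], \\
D(X, B) = \Pr[X \prec B] ={} & \Pr[X \prec B \prec A] + \Pr[X \prec A \sim B] \\
 & + \Pr[X \prec A \prec B] + \Pr[A \sim X \prec B] + \Pr[A \prec X \prec B].
\end{aligned}$$

So

$$\begin{aligned}
D(A, X) + D(X, B) - D(A, B) ={} & \Pr[B \prec A \prec X] + \Pr[A \sim B \prec X] \\
 & + \Pr[X \prec B \prec A] + \Pr[X \prec A \sim B] + \Pr[A \prec X \prec B] \ge 0.
\end{aligned}$$

□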
Since in the last expression all events are pairwise exclusive, we have

$$D(A, X) + D(X, B) - D(A, B) \le 1.$$

This may seem an attractive addition to the triangle inequality. The inequality
is redundant, however, as it is subsumed by the triangle inequalities holding on
$\{A, B, X\}$. Rewriting the expression above as

$$D(A, B) + 1 - D(X, B) - D(A, X) \ge 0,$$

it immediately follows from

$$D(A, B) + D(B, X) - D(A, X) \ge 0$$

and

$$D(B, X) = \Pr[B \prec X] \le 1 - \Pr[X \prec B] = 1 - D(X, B).$$
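The decomposition used in the proof can be checked numerically. The sketch
below assumes three jointly distributed variables with values in {0, 1, 2}
ordered by the usual order of integers, with a hypothetical joint pmf built
from arbitrary positive weights; it compares the triangle-inequality slack
with the sum of the five pairwise exclusive events.

```python
from fractions import Fraction
from itertools import product

values = (0, 1, 2)
# Hypothetical positive weights over outcomes (a, b, x), normalized to a pmf.
weights = {(a, b, x): 1 + a + 2 * b + 3 * x + a * b * x
           for a, b, x in product(values, repeat=3)}
total = sum(weights.values())
pmf = {k: Fraction(w, total) for k, w in weights.items()}

def pr(event):
    """Probability of an event given as a predicate on (a, b, x)."""
    return sum(p for (a, b, x), p in pmf.items() if event(a, b, x))

D_AB = pr(lambda a, b, x: a < b)
D_AX = pr(lambda a, b, x: a < x)
D_XB = pr(lambda a, b, x: x < b)
slack = D_AX + D_XB - D_AB

# The five events on the right-hand side of the last display in the proof.
five_events = (
    pr(lambda a, b, x: b < a < x)
    + pr(lambda a, b, x: a == b < x)
    + pr(lambda a, b, x: x < b < a)
    + pr(lambda a, b, x: x < a == b)
    + pr(lambda a, b, x: a < x < b)
)
```

Exact rational arithmetic is used so that the identity can be tested with
equality rather than with a tolerance.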

2 Selective probabilistic causality 

Consider an indexed set $W = \{W^\lambda : \lambda \in \Lambda\}$, with each
$W^\lambda$ being a set referred to as a (deterministic) input, with the
elements of $\{\lambda\} \times W^\lambda$ called input points. Input points
therefore are pairs of the form $x = (\lambda, w)$ and should not be confused
with input values $w$. A nonempty set
$\Phi \subset \prod_{\lambda \in \Lambda} W^\lambda$ is called a set of
(allowable) treatments; a treatment therefore is also a set of pairs of the
form $(\lambda, w)$.

Let there be a collection of sets of random variables, referred to as (random)
outputs,

$$A_\phi = \{A^\lambda_\phi : \lambda \in \Lambda\}, \quad \phi \in \Phi,$$

such that the distribution of $A_\phi$ (i.e., the joint distribution of all
$A^\lambda_\phi$ in $A_\phi$) is known for every treatment $\phi$. We define

$$A^\lambda = \{A^\lambda_\phi : \phi \in \Phi\}, \quad \lambda \in \Lambda,$$

with the understanding that $A^\lambda$ is not a random variable (i.e.,
$A^\lambda_\phi$ for different $\phi$ are not jointly distributed).

The following problem is encountered in a wide variety of contexts (see
Dzhafarov, 2003; Dzhafarov & Gluhovsky, 2006; Kujala & Dzhafarov, 2008). We say
that the dependence of random outputs $A^\lambda$ on the deterministic inputs
$W^\lambda$ is (canonically) selective if, for every $\lambda \in \Lambda$ and
every $\phi \in \Phi$, the output $A^\lambda_\phi$ is "influenced" by none of
the input points in $\phi$ except, possibly, for the one belonging to
$\{\lambda\} \times W^\lambda$. The question is how one should define this
selectivity of "influences" rigorously, and how one can determine whether this
selectivity holds. This problem was introduced to behavioral sciences in
Sternberg (1969) and Townsend (1984). In quantum physics, using different
terminology, it was introduced in Bell (1964) and elaborated in Fine
(1982a-b). The definition can be given in several equivalent forms, of which
we present the one focal for the present context.
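Definition 2.1. The dependence of $\{A^\lambda : \lambda \in \Lambda\}$ on
$\{W^\lambda : \lambda \in \Lambda\}$ (or the "influence" of the latter on the
former) is (canonically) selective if there is a set of jointly distributed
random variables

$$H = \{H^\lambda_w : w \in W^\lambda, \lambda \in \Lambda\}$$

(one random variable for every value of every input), such that, for every
$\phi \in \Phi$,

$$H_\phi \sim A_\phi$$

(equality in distribution), where

$$H_\phi = \{H^\lambda_w : (\lambda, w) \in \phi\}$$

and

$$A_\phi = \{A^\lambda_\phi : \lambda \in \Lambda\}$$

(the corresponding elements of $H_\phi$ and $A_\phi$ being those sharing the
same $\lambda$).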

This definition is known as the Joint Distribution Criterion (JDC) for se- 
lectivity of influences, and the set H satisfying this definition is referred to as a 
(hypothetical) JDC-set. Specialized forms of this criterion in quantum physics 
can be found in Suppes & Zanotti (1981) and Fine (1982a-b); in the behav- 
ioral context and in complete generality this criterion is given (derived from an 
equivalent definition) in Dzhafarov & Kujala (2010). 

Remark 2.2. The adjective "canonical" in the definition refers to the
one-to-one correspondence between $W^\lambda$ and $A^\lambda$ sharing the same
$\lambda$. A seemingly more general scheme, in which different $A^\lambda$ are
selectively influenced by different (possibly overlapping) subsets of
$\{W^\lambda : \lambda \in \Lambda\}$, is always reducible to the canonical
form by considering, for every $A^\lambda$, the Cartesian product of the
inputs influencing it to be a single input, and redefining correspondingly the
sets of input points and the set of allowable treatments.

The simplest consequence of JDC is that the selectivity of influences implies
marginal selectivity (Dzhafarov, 2003; Townsend & Schweickert, 1989), defined
as follows. For any $\Lambda' \subset \Lambda$ we can uniquely present any
$\phi \in \Phi$ as $\phi' \cup \phi''$, where
$\phi' \in \prod_{\lambda \in \Lambda'} W^\lambda$ and
$\phi'' \in \prod_{\lambda \in \Lambda - \Lambda'} W^\lambda$. Then, if JDC is
satisfied, the joint distribution of
$\{A^\lambda_\phi : \lambda \in \Lambda'\}$ does not depend on $\phi''$.

Remark 2.3. In the following we always assume that marginal selectivity is 
satisfied. 

The relevance of the order-distance and other p.q.-metrics on the sets of
jointly distributed random variables to the problem of selectivity lies in the 
general test (necessary condition) for selectivity of influences, formulated after 
the following definition. 

Definition 2.4. We call a sequence of input points

$$x_1 = (\alpha_1, w_1), \dots, x_l = (\alpha_l, w_l)$$

(where $w_i \in W^{\alpha_i}$ for $i = 1, \dots, l \ge 3$) treatment-realizable
if there are treatments $\phi^1, \dots, \phi^l \in \Phi$ (not necessarily
pairwise distinct), such that

$$\{x_1, x_l\} \subset \phi^1 \text{ and } \{x_{i-1}, x_i\} \subset \phi^i \text{ for } i = 2, \dots, l.$$

If a JDC-set $H$ exists, then for any p.q.-metric $d$ on $H$ we should have

$$d(H^{\alpha_1}_{w_1}, H^{\alpha_l}_{w_l}) = d(A^{\alpha_1}_{\phi^1}, A^{\alpha_l}_{\phi^1})$$

and

$$d(H^{\alpha_{i-1}}_{w_{i-1}}, H^{\alpha_i}_{w_i}) = d(A^{\alpha_{i-1}}_{\phi^i}, A^{\alpha_i}_{\phi^i})$$

for $i = 2, \dots, l$, whence, by the chain inequality (1),

$$d(A^{\alpha_1}_{\phi^1}, A^{\alpha_l}_{\phi^1}) \le \sum_{i=2}^{l} d(A^{\alpha_{i-1}}_{\phi^i}, A^{\alpha_i}_{\phi^i}). \tag{2}$$

This chain inequality, written entirely in terms of observable probabilities,
is referred to as a p.q.-metric test for selectivity of influences. If this
inequality is violated for at least one treatment-realizable sequence of input
points, no JDC-set $H$ exists, and the selectivity is ruled out. Note: if the
sequence $\phi^1, \dots, \phi^l \in \Phi$ for a given $x_1, \dots, x_l$ can be
chosen in more than one way, the observable quantities
$d(A^{\alpha_1}_{\phi^1}, A^{\alpha_l}_{\phi^1})$ and
$d(A^{\alpha_{i-1}}_{\phi^i}, A^{\alpha_i}_{\phi^i})$ remain invariant due to
the (tacitly assumed) marginal selectivity.

As an example, let $\Lambda = \{1, 2\}$, $W^1 = [0, 1]$, $W^2 = [0, 1]$,
$\Phi = W^1 \times W^2$. For any $\phi = \{(1, v), (2, w)\} = (v, w)$, let
$(A^1_\phi, A^2_\phi)$ have a bivariate normal distribution with zero means,
unit variances, and correlation $\rho = \min(1, v + w)$. Marginal selectivity
is trivially satisfied. Do $\{W^1, W^2\}$ influence $\{A^1, A^2\}$
selectively? For any bivariate normally distributed $(A, B)$, let us define
$A \prec B$ iff $A < 0$, $B \ge 0$. Then the corresponding order-distance on
the hypothetical JDC-set $H$ is

$$D(H^1_v, H^2_w) = \frac{\arccos(\min(1, v + w))}{2\pi}.$$

The sequence of input points (1, 0), (2, 1), (1, 1), (2, 0) is
treatment-realizable, so if $H$ exists, we should have

$$D(H^1_0, H^2_0) \le D(H^1_0, H^2_1) + D(H^2_1, H^1_1) + D(H^1_1, H^2_0).$$

The numerical substitutions yield, however,

$$\frac{1}{4} \le 0 + 0 + 0,$$

and as this is false, the hypothesis that $\{W^1, W^2\}$ influence
$\{A^1, A^2\}$ selectively is rejected.
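The arithmetic of this example can be reproduced in a few lines. The sketch
below assumes the standard orthant probability for a standard bivariate normal
pair, Pr[A < 0, B >= 0] = arccos(rho) / (2 pi), which by symmetry of the
distribution also gives Pr[B < 0, A >= 0]; the function name is hypothetical.

```python
import math

def order_distance(v, w):
    """Order-distance between the outputs at input values v and w.

    Uses Pr[A < 0, B >= 0] = arccos(rho) / (2*pi) for a standard bivariate
    normal pair with correlation rho = min(1, v + w). The same value holds
    with the roles of A and B exchanged, so argument order does not matter here.
    """
    rho = min(1.0, v + w)
    return math.acos(rho) / (2.0 * math.pi)

# Chain inequality for the sequence (1, 0), (2, 1), (1, 1), (2, 0).
lhs = order_distance(0, 0)
rhs = order_distance(0, 1) + order_distance(1, 1) + order_distance(1, 0)
chain_holds = lhs <= rhs
```

With these inputs the left-hand side is 1/4 and every term on the right
vanishes, so `chain_holds` is false, in line with the rejection of selectivity
in the text.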

The theorem below and its corollary show that one only needs to check the
chain inequality for a special subset of all possible treatment-realizable
sequences $x_1, \dots, x_l$.

Definition 2.5. A treatment-realizable sequence $x_1, \dots, x_l$ is called
irreducible if $x_1 \ne x_l$ and the only subsequences
$\{x_{i_1}, \dots, x_{i_k}\}$ with $k > 1$ that are subsets of treatments are
pairs $\{x_1, x_l\}$ and $\{x_{i-1}, x_i\}$, for $i = 2, \dots, l$. Otherwise
the sequence is reducible.

Theorem 2.6. Given a p.q.-metric $d$ on the hypothetical JDC-set $H$,
inequality (2) is satisfied for all treatment-realizable sequences if and only
if this inequality holds for all irreducible sequences.

Proof. We prove this theorem by showing that if (2) is violated for some
reducible sequence $x_1, \dots, x_l$, then it is violated for some proper
subsequence thereof. Clearly, $x_1 \ne x_l$ because otherwise (2) is not
violated. For $l = 3$, $x_1, x_2, x_3$ is reducible only if it is contained in
a treatment: but then (2) would be satisfied. So $l > 3$, and the reducibility
of $x_1, \dots, x_l$ means that there is a pair $\{x_p, x_q\}$ belonging to a
treatment, with $(p, q) \ne (1, l)$ and $q > p + 1$. But then (2) must be
violated for either $x_p, \dots, x_q$ or $x_1, \dots, x_p, x_q, \dots, x_l$
(allowing for $p = 1$ or $q = l$ but not both). □

If $\Phi = \prod_{\lambda \in \Lambda} W^\lambda$ (all logically possible
treatments are allowable), then any subsequence $x_{i_1}, \dots, x_{i_k}$ of
input points with pairwise distinct $\alpha_{i_1}, \dots, \alpha_{i_k}$
belongs to some treatment. Therefore an irreducible sequence cannot contain
points of more than two inputs, and it is easy to see that then it must be a
sequence of pairwise distinct $x_1 \in \{\alpha\} \times W^\alpha$,
$x_2 \in \{\beta\} \times W^\beta$, ...,
$x_{2m-1} \in \{\alpha\} \times W^\alpha$,
$x_{2m} \in \{\beta\} \times W^\beta$ ($\alpha \ne \beta$). It is also easy to
see that if $m > 2$, each of the subsets $\{x_1, x_4\}$ and $\{x_2, x_5\}$
will belong to a treatment. Hence $m = 2$ is the only possibility for an
irreducible sequence.

Corollary 2.7. If $\Phi = \prod_{\lambda \in \Lambda} W^\lambda$, then
inequality (2) is satisfied for all treatment-realizable sequences if and only
if this inequality holds for all tetradic sequences of the form
$x, y, s, t$, with $x, s \in \{\alpha\} \times W^\alpha$,
$y, t \in \{\beta\} \times W^\beta$, $x \ne s$, $y \ne t$,
$\alpha \ne \beta$.

Remark 2.8. This formulation is given in Dzhafarov and Kujala (2010), although 
there it is unnecessarily confined to metrics of a special kind. 
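When all logically possible treatments are allowable, the test therefore
reduces to a finite enumeration of tetradic sequences whenever the inputs have
finitely many values. The following sketch assumes two inputs restricted to
the values {0, 1} and, purely for illustration, reuses the order-distance of
the bivariate normal example from this section; the helper names are
hypothetical.

```python
import math
from itertools import permutations

def d(p, q):
    """Order-distance between outputs at input points p = (label, value), q.

    Hypothetical helper: the arccos formula of the bivariate normal example,
    which is defined only for points of two different inputs.
    """
    (lam_p, w_p), (lam_q, w_q) = p, q
    assert lam_p != lam_q
    return math.acos(min(1.0, w_p + w_q)) / (2.0 * math.pi)

def tetradic_violations(W, dist, tol=1e-12):
    """Return all tetradic sequences (x, y, s, t) violating the chain inequality.

    W maps each input label to a list of its values; x, s range over distinct
    points of one input and y, t over distinct points of another input.
    """
    found = []
    for alpha, beta in permutations(W, 2):
        pts_a = [(alpha, w) for w in W[alpha]]
        pts_b = [(beta, w) for w in W[beta]]
        for x, s in permutations(pts_a, 2):
            for y, t in permutations(pts_b, 2):
                if dist(x, t) > dist(x, y) + dist(y, s) + dist(s, t) + tol:
                    found.append((x, y, s, t))
    return found

W = {1: [0.0, 1.0], 2: [0.0, 1.0]}
violations = tetradic_violations(W, d)
```

For this grid the enumeration finds the violating sequence of the example and
its mirror image in which the roles of the two inputs are exchanged.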

3 An application 

The four tables below represent results of an experiment with a 2 x 2
factorial design, $\{x, x'\} \times \{y, y'\}$, and two binary responses, $A$
and $B$. In relation to our general notation, we have here
$\Lambda = \{1, 2\}$, $W^1 = \{x, x'\}$, $W^2 = \{y, y'\}$, and four
treatments $(x, y), \dots, (x', y')$; for every treatment $\phi$, the random
outputs $A^1_\phi$ and $A^2_\phi$ are represented by, respectively, $A_\phi$
and $B_\phi$, each having two possible values, arbitrarily labeled. This
design is arguably the simplest possible, and it is ubiquitous in science. In
a psychological double-detection experiment (see, e.g., Townsend & Nozawa,
1995), the input values may represent presence ($x$ and $y$) or absence ($x'$
and $y'$) of a designated signal in two stimuli labeled 1 and 2, presented
side-by-side. The participant in such an experiment is asked to indicate
whether the signal was present or absent in stimulus 1 and in stimulus 2. The
output values $A = \circ$ and $B = \sqcap$ may indicate either that the
response was "signal present" or that the response was correct; and
analogously for $A = \bullet$ and $B = \sqcup$ (either "signal absent" or an
incorrect response). The entries $p_{ij}, q_{ij}$, etc. represent joint
probabilities of the corresponding outcomes; $a_{i\cdot}, b_{\cdot j}$, etc.
represent marginal probabilities. The question to be answered is: does the
response to a given stimulus ($A$ to 1 and $B$ to 2) selectively depend on
that stimulus alone (despite $A$ and $B$ being stochastically dependent for
every treatment), or is $A$ or $B$ influenced by both 1 and 2?



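$$\begin{array}{c|cc|c}
\phi = (x, y) & B_{xy} = \sqcup & B_{xy} = \sqcap & \\ \hline
A_{xy} = \bullet & p_{11} & p_{12} & a_{1\cdot} \\
A_{xy} = \circ & p_{21} & p_{22} & a_{2\cdot} \\ \hline
 & b_{\cdot 1} & b_{\cdot 2} &
\end{array}
\qquad
\begin{array}{c|cc|c}
\phi = (x, y') & B_{xy'} = \sqcup & B_{xy'} = \sqcap & \\ \hline
A_{xy'} = \bullet & q_{11} & q_{12} & a_{1\cdot} \\
A_{xy'} = \circ & q_{21} & q_{22} & a_{2\cdot} \\ \hline
 & b'_{\cdot 1} & b'_{\cdot 2} &
\end{array}$$

$$\begin{array}{c|cc|c}
\phi = (x', y) & B_{x'y} = \sqcup & B_{x'y} = \sqcap & \\ \hline
A_{x'y} = \bullet & r_{11} & r_{12} & a'_{1\cdot} \\
A_{x'y} = \circ & r_{21} & r_{22} & a'_{2\cdot} \\ \hline
 & b_{\cdot 1} & b_{\cdot 2} &
\end{array}
\qquad
\begin{array}{c|cc|c}
\phi = (x', y') & B_{x'y'} = \sqcup & B_{x'y'} = \sqcap & \\ \hline
A_{x'y'} = \bullet & s_{11} & s_{12} & a'_{1\cdot} \\
A_{x'y'} = \circ & s_{21} & s_{22} & a'_{2\cdot} \\ \hline
 & b'_{\cdot 1} & b'_{\cdot 2} &
\end{array}$$
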
Another important situation in which we encounter formally the same problem
is the Einstein-Podolsky-Rosen (EPR) paradigm. Two particles are emitted
from a common source in such a way that they remain entangled (have highly
correlated properties, such as momenta or spins) as they run away from each
other (Aspect, 1999; Mermin, 1985). An experiment may consist, e.g., in
measuring the spin of electron 1 along one of two axes, $x$ or $x'$, and (in
another
location but simultaneously in some inertial frame of reference) measuring the 
spin of electron 2 along one of two axes, y or y' . The outcome A of a measure- 
ment on electron 1 is a random variable with two possible values, "up" or "down," 
and the same holds for B, the outcome of a measurement on electron 2. The 
question here is: do the measurements on electrons 1 and 2 selectively affect, 
respectively, A and B (even though generally A and B are not independent at 
any of the four combinations of spin axes)? If the answer is negative, then the 
measurement of one electron affects the outcome of the measurement of another 
electron even though no signal can be exchanged between two distant events 
that are simultaneous in some frame of reference. What makes this situation 
formally identical to the double-detection example described above is that the 
measurements performed along different axes on the same particle, x and x' or y 
and y', are non-commuting, i.e., they cannot be performed simultaneously. This 
makes it possible to consider such measurements as mutually exclusive values 
of an input. 
Theorem 3.1 (Fine, 1982a-b). A JDC-set
$H = \{H^1_x, H^1_{x'}, H^2_y, H^2_{y'}\}$ satisfying

$$\{H^1_x, H^2_y\} \sim \{A_{xy}, B_{xy}\}, \quad \{H^1_x, H^2_{y'}\} \sim \{A_{xy'}, B_{xy'}\},$$

$$\{H^1_{x'}, H^2_y\} \sim \{A_{x'y}, B_{x'y}\}, \quad \{H^1_{x'}, H^2_{y'}\} \sim \{A_{x'y'}, B_{x'y'}\}$$

exists if and only if the following eight inequalities are satisfied:

$$\begin{aligned}
-1 &\le p_{11} + r_{11} + s_{11} - q_{11} - a'_{1\cdot} - b_{\cdot 1} \le 0, \\
-1 &\le q_{11} + s_{11} + r_{11} - p_{11} - a'_{1\cdot} - b'_{\cdot 1} \le 0, \\
-1 &\le r_{11} + p_{11} + q_{11} - s_{11} - a_{1\cdot} - b_{\cdot 1} \le 0, \\
-1 &\le s_{11} + q_{11} + p_{11} - r_{11} - a_{1\cdot} - b'_{\cdot 1} \le 0.
\end{aligned} \tag{3}$$

We refer to (3) as Bell-CHSH-Fine inequalities, where CHSH abbreviates
Clauser, Horne, Shimony, & Holt (1969): in this work Bell's (1964) approach
was developed into a special version of (3).
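As an illustration, the four double inequalities can be evaluated directly
from the eight observable probabilities. The sketch below is not taken from
the cited works; the function and argument names (`a1`, `a1p` for the two
marginals of A, `b1`, `b1p` for those of B) and both sets of numbers are
hypothetical.

```python
from fractions import Fraction as F

def bell_chsh_fine(p11, q11, r11, s11, a1, a1p, b1, b1p):
    """Return the four middle expressions of the Bell-CHSH-Fine inequalities.

    a1, a1p stand for the marginals of A under x and x';
    b1, b1p stand for the marginals of B under y and y'.
    """
    return (
        p11 + r11 + s11 - q11 - a1p - b1,
        q11 + s11 + r11 - p11 - a1p - b1p,
        r11 + p11 + q11 - s11 - a1 - b1,
        s11 + q11 + p11 - r11 - a1 - b1p,
    )

def satisfies(expressions):
    """True iff every expression lies in the closed interval [-1, 0]."""
    return all(-1 <= e <= 0 for e in expressions)

half = F(1, 2)
# Independent fair binary responses under every treatment.
independent = bell_chsh_fine(F(1, 4), F(1, 4), F(1, 4), F(1, 4),
                             half, half, half, half)
# Hypothetical extreme tables: A and B always agree under three treatments
# and never agree under (x, y'); all marginals equal 1/2.
extreme = bell_chsh_fine(half, F(0), half, half, half, half, half, half)
```

The independent tables pass all inequalities, while the extreme tables violate
the first right-hand inequality even though marginal selectivity holds.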

Remark 3.2. The proof given in Fine (1982a-b) that (3) is both necessary and
sufficient (under marginal selectivity) for the existence of a JDC-set can be
conceptually simplified: the Bell-CHSH-Fine inequalities can be algebraically
shown to be the criterion for the existence of a vector $Q$ with 16
probabilities

$$\Pr[H^1_x = \bullet, H^1_{x'} = \bullet, H^2_y = \sqcup, H^2_{y'} = \sqcup], \dots, \Pr[H^1_x = \circ, H^1_{x'} = \circ, H^2_y = \sqcap, H^2_{y'} = \sqcap]$$

that sum to one and whose appropriately chosen partial sums yield the 8
observable probabilities

$$p_{11}, q_{11}, r_{11}, s_{11}, a_{1\cdot}, b_{\cdot 1}, a'_{1\cdot}, b'_{\cdot 1}$$

(other probabilities being determined due to marginal selectivity). This is a
simple linear programming task, and the Bell-CHSH-Fine inequalities can be
derived "mechanically" by a facet enumeration algorithm (see Werner & Wolf,
2001a-b, and Basoalto & Percival, 2003).
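The necessity half of this statement can be illustrated without a linear
programming solver. Every vector of 16 probabilities is a mixture of the 16
deterministic assignments of values to the four hypothetical variables, and
the expressions in the Bell-CHSH-Fine inequalities are linear in the
observable probabilities, so it suffices to check them at the deterministic
assignments. The sketch below does this by enumeration; the coding of the
first response label as 1 and all names are assumptions made for illustration.

```python
from itertools import product

def observables(hx, hxp, hy, hyp):
    """Eight observable probabilities of a deterministic JDC-set.

    Each argument is 1 or 2; value 1 codes the first label of a response.
    """
    ind = lambda cond: 1 if cond else 0
    return {
        "p11": ind(hx == 1 and hy == 1),
        "q11": ind(hx == 1 and hyp == 1),
        "r11": ind(hxp == 1 and hy == 1),
        "s11": ind(hxp == 1 and hyp == 1),
        "a1": ind(hx == 1), "a1p": ind(hxp == 1),
        "b1": ind(hy == 1), "b1p": ind(hyp == 1),
    }

def bcf_expressions(o):
    """Middle expressions of the four Bell-CHSH-Fine double inequalities."""
    return (
        o["p11"] + o["r11"] + o["s11"] - o["q11"] - o["a1p"] - o["b1"],
        o["q11"] + o["s11"] + o["r11"] - o["p11"] - o["a1p"] - o["b1p"],
        o["r11"] + o["p11"] + o["q11"] - o["s11"] - o["a1"] - o["b1"],
        o["s11"] + o["q11"] + o["p11"] - o["r11"] - o["a1"] - o["b1p"],
    )

vertices = list(product((1, 2), repeat=4))
vertex_ok = [all(-1 <= e <= 0 for e in bcf_expressions(observables(*h)))
             for h in vertices]
```

Sufficiency is the harder direction and is not addressed by this enumeration.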

The point of interest in the present context is that the Bell-CHSH-Fine 
inequalities, whose rather obscure structure does not seem to fit their funda- 
mental importance, turn out to be interpretable as the triangle inequalities for 
appropriately chosen order-distances. 

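Consider the chain inequalities for the order-distance $D_1$ obtained by
putting $\bullet = \sqcup = 1$, $\circ = \sqcap = 2$, and identifying
$\preceq$ with $\le$:

$$\begin{aligned}
q_{12} = D_1(H^1_x, H^2_{y'}) &\le D_1(H^1_x, H^2_y) + D_1(H^2_y, H^1_{x'}) + D_1(H^1_{x'}, H^2_{y'}) = p_{12} + r_{21} + s_{12}, \\
p_{12} = D_1(H^1_x, H^2_y) &\le D_1(H^1_x, H^2_{y'}) + D_1(H^2_{y'}, H^1_{x'}) + D_1(H^1_{x'}, H^2_y) = q_{12} + s_{21} + r_{12}, \\
s_{12} = D_1(H^1_{x'}, H^2_{y'}) &\le D_1(H^1_{x'}, H^2_y) + D_1(H^2_y, H^1_x) + D_1(H^1_x, H^2_{y'}) = r_{12} + p_{21} + q_{12}, \\
r_{12} = D_1(H^1_{x'}, H^2_y) &\le D_1(H^1_{x'}, H^2_{y'}) + D_1(H^2_{y'}, H^1_x) + D_1(H^1_x, H^2_y) = s_{12} + q_{21} + p_{12}.
\end{aligned} \tag{4}$$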
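Consider also the inequalities for the order-distance $D_2$ obtained by
putting $\bullet = \sqcap = 1$, $\circ = \sqcup = 2$, and identifying
$\preceq$ with $\le$:

$$\begin{aligned}
q_{11} = D_2(H^1_x, H^2_{y'}) &\le D_2(H^1_x, H^2_y) + D_2(H^2_y, H^1_{x'}) + D_2(H^1_{x'}, H^2_{y'}) = p_{11} + r_{22} + s_{11}, \\
p_{11} = D_2(H^1_x, H^2_y) &\le D_2(H^1_x, H^2_{y'}) + D_2(H^2_{y'}, H^1_{x'}) + D_2(H^1_{x'}, H^2_y) = q_{11} + s_{22} + r_{11}, \\
s_{11} = D_2(H^1_{x'}, H^2_{y'}) &\le D_2(H^1_{x'}, H^2_y) + D_2(H^2_y, H^1_x) + D_2(H^1_x, H^2_{y'}) = r_{11} + p_{22} + q_{11}, \\
r_{11} = D_2(H^1_{x'}, H^2_y) &\le D_2(H^1_{x'}, H^2_{y'}) + D_2(H^2_{y'}, H^1_x) + D_2(H^1_x, H^2_y) = s_{11} + q_{22} + p_{11}.
\end{aligned} \tag{5}$$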

Theorem 3.3. Each right-hand Bell-CHSH-Fine inequality is equivalent to the
corresponding chain inequality in (4) for the order-distance $D_1$. Each
left-hand Bell-CHSH-Fine inequality is equivalent to the corresponding chain
inequality in (5) for the order-distance $D_2$.

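Proof. We show the proof for the first of the Bell-CHSH-Fine
double-inequalities. The equivalence of

$$p_{11} + r_{11} + s_{11} - q_{11} - a'_{1\cdot} - b_{\cdot 1} \le 0$$

to

$$q_{12} \le p_{12} + r_{21} + s_{12}$$

obtains by using the identities

$$q_{12} = a_{1\cdot} - q_{11}, \quad p_{12} = a_{1\cdot} - p_{11}, \quad r_{21} = b_{\cdot 1} - r_{11}, \quad s_{12} = a'_{1\cdot} - s_{11}.$$

The equivalence of

$$p_{11} + r_{11} + s_{11} - q_{11} - a'_{1\cdot} - b_{\cdot 1} \ge -1$$

to

$$q_{11} \le p_{11} + r_{22} + s_{11}$$

follows from the identity

$$r_{22} = 1 + r_{11} - a'_{1\cdot} - b_{\cdot 1}.$$

□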
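The identities in this proof say that the slack of the first chain inequality
for the first order-distance equals minus the first Bell-CHSH-Fine expression,
and the slack of the corresponding chain inequality for the second
order-distance equals that expression plus one. A minimal sketch checking this
on one hypothetical set of tables (all numbers and names below are
illustrative assumptions):

```python
from fractions import Fraction as F

def first_inequality_terms(p11, q11, r11, s11, a1, a1p, b1):
    """Return (e1, slack_d1, slack_d2) for the first double inequality.

    a1, a1p are the marginals of A under x and x'; b1 is the marginal of B
    under y. Remaining cell probabilities follow from marginal selectivity.
    """
    e1 = p11 + r11 + s11 - q11 - a1p - b1        # first Bell-CHSH-Fine expression
    q12, p12 = a1 - q11, a1 - p11
    r21, s12 = b1 - r11, a1p - s11
    r22 = 1 + r11 - a1p - b1
    slack_d1 = p12 + r21 + s12 - q12             # q12 <= p12 + r21 + s12
    slack_d2 = p11 + r22 + s11 - q11             # q11 <= p11 + r22 + s11
    return e1, slack_d1, slack_d2

e1, slack_d1, slack_d2 = first_inequality_terms(
    F(3, 8), F(1, 8), F(1, 4), F(3, 8), F(1, 2), F(1, 2), F(1, 2))
```

Nonnegativity of `slack_d1` is thus the same condition as `e1 <= 0`, and
nonnegativity of `slack_d2` the same as `e1 >= -1`.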

4 Concluding remarks 

The order-distances are versatile and have a broad sphere of applicability be- 
cause order relations on the domains of any given set of random variables can 
always be defined in many different ways. If no other structure is available, this 
can always be done by the partitioning of the domains mentioned in Section 1
and used in the example with bivariate normal distributions in Section 2 as
well as for the binary variables of the previous section:
$V_\omega = \bigcup_{k=1}^{n} V_\omega^{(k)}$,
$V_\omega^{(k)} \in \Sigma_\omega$, $\omega \in \Omega$, putting
$a \preceq b$ if and only if $a \in V_\alpha^{(k)}$, $b \in V_\beta^{(l)}$ and
$k \le l$. Due to its universality and convenience of use, the resulting
order-distance deserves a special name, classification distance.

Under additional constraints one can suggest many other p.q.-metrics on
sets of jointly distributed random variables. Thus, if the variables in $H$
are real-valued with the conventional Borel sigma algebras, one can define,
for any $A, B \in H$,

$$d^{(p)}(A, B) = \begin{cases} (E[|A - B|^p])^{1/p} & \text{for } 1 \le p < \infty, \\ \operatorname{ess\,sup} |A - B| & \text{for } p = \infty, \end{cases} \tag{6}$$

where

$$\operatorname{ess\,sup} |A - B| = \inf\{v : \Pr[|A - B| \le v] = 1\}.$$

These p.q.-metrics are conventional metrics. In the context of selective
influences these metrics have been introduced in Kujala & Dzhafarov (2008) and
further analyzed in Dzhafarov & Kujala (2010). An important property of
$d^{(p)}$ is that the result of a $d^{(p)}$-based distance-type test is not
invariant with respect to input-value-specific transformations of the random
variables $A^\lambda_\phi$, $\phi \in \Phi$, $\lambda \in \Lambda$. This means
that the test can be performed on a potential infinity of sets of random
variables of the form $F(x_\lambda, A^\lambda_\phi)$, with
$x_\lambda \in (\{\lambda\} \times W^\lambda) \cap \phi$.

If the jointly distributed random variables constituting the set $H$ are
discrete, one can use information-based p.q.-metrics. Perhaps the simplest of
them is

$$h(A|B) = -\sum_{a,b} p_{AB}(a, b) \log \frac{p_{AB}(a, b)}{p_B(b)}, \quad A, B \in H, \tag{7}$$

with the conventions $0 \log \frac{0}{0} = 0 \log 0 = 0$. This function is
called conditional entropy. The identity $h(A|A) = 0$ is obvious, and the
triangle inequality,

$$h(A|B) \le h(A|C) + h(C|B),$$

follows from the standard information theory (in)equalities,

$$h(A|B) \le h(A, C|B),$$

$$h(A, C|B) = h(A|C, B) + h(C|B),$$

and

$$h(A|C, B) \le h(A|C).$$

Note that, unlike with the distance $d^{(p)}$ above, the test of selectiveness
based on $h(A|B)$ (and other information-based distances) is invariant with
respect to all bijective transformations of the variables. The additively
symmetrized (i.e., pseudometric) version of this p.q.-metric,
$h(A|B) + h(B|A)$, is well-known (Cover & Thomas, 1990).
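A small numerical sketch of the conditional entropy as a p.q.-metric follows.
It assumes three binary variables with a hypothetical, strictly positive joint
pmf and uses base-2 logarithms; the function name and the way coordinates are
selected by index are illustrative choices.

```python
import math
from itertools import product

def cond_entropy(pmf, target, given):
    """Conditional entropy h(target | given) in bits.

    pmf maps outcome tuples to probabilities; target and given are tuples of
    coordinate indices into the outcome tuples.
    """
    joint, marginal = {}, {}
    for outcome, p in pmf.items():
        g = tuple(outcome[i] for i in given)
        t = tuple(outcome[i] for i in target)
        joint[(t, g)] = joint.get((t, g), 0.0) + p
        marginal[g] = marginal.get(g, 0.0) + p
    return -sum(p * math.log2(p / marginal[g])
                for (t, g), p in joint.items() if p > 0)

# Hypothetical positive weights over outcomes (a, b, c), normalized to a pmf.
weights = {(a, b, c): 1 + a + 2 * b * c + (a ^ c)
           for a, b, c in product((0, 1), repeat=3)}
total = sum(weights.values())
pmf = {k: w / total for k, w in weights.items()}

A, B, C = 0, 1, 2
h_AB = cond_entropy(pmf, (A,), (B,))
h_AC = cond_entropy(pmf, (A,), (C,))
h_CB = cond_entropy(pmf, (C,), (B,))
h_AA = cond_entropy(pmf, (A,), (A,))
```

Because floating-point arithmetic is used, comparisons with these values
should allow a small tolerance.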
There are numerous ways of creating new p.q.-metrics from the ones already
constructed, including those taken from outside the probabilistic context.
Thus, if $d$ is a p.q.-metric on a set $S$, then, for any set $H$ of jointly
distributed random variables taking their values in $S$,

$$D(A, B) = E[d(A, B)], \quad A, B \in H,$$

is a p.q.-metric on $H$. This follows from the fact that expectation $E$
preserves inequalities and equalities identically satisfied for all possible
realizations of the arguments. Thus, the distance
$d^{(1)}(A, B) = E[|A - B|]$ trivially obtains from the metric $|a - b|$ on
reals. In the same way one obtains the well-known Fréchet distance

$$F(A, B) = E\left[\frac{|A - B|}{1 + |A - B|}\right].$$

Below we present an incomplete list of transformations which, given a
p.q.-metric (quasimetric, pseudometric, conventional metric) $d$ on a space
$H$ of jointly distributed random variables, produce a new p.q.-metric
(respectively, quasimetric, pseudometric, or conventional metric) on the same
space. The proofs are trivial or well-known. The arrows should be read "can be
transformed into."

1. $d \Rightarrow d^q$ ($q < 1$). In this way, for example, we can obtain
metrics

$$\begin{cases} (E[|A - B|^p])^{q/p} & \text{for } 1 \le p < \infty, \\ (\operatorname{ess\,sup} |A - B|)^q & \text{for } p = \infty \end{cases}$$

from the metrics $d^{(p)}$ defined in (6).

2. $d \Rightarrow d/(1 + d)$, a standard way of creating a bounded p.q.-metric.

3. $d_1, d_2 \Rightarrow \max\{d_1, d_2\}$ or $d_1, d_2 \Rightarrow d_1 + d_2$.
These transformations can be used to symmetrize p.q.-metrics,
$d(A, B) + d(B, A)$ or $\max\{d(A, B), d(B, A)\}$ (although this is never
useful when using chain inequalities as necessary conditions: any violation of
a chain inequality with the symmetrized quantities implies a violation of this
inequality by the original p.q.-metric, but not vice versa).

4. A generalization of the previous: $\{d_v : v \in T\} \Rightarrow \sup\{d_v\}$
and $\{d_v : v \in T\} \Rightarrow E[d_U]$, where $\{d_v : v \in T\}$ is a
family of p.q.-metrics, and $U$ designates a random variable with a
probability measure $m$, so that

$$d(A, B) = \int_{v \in T} d_v(A, B) \, dm(v).$$


To illustrate the latter way of constructing p.q.-metrics, consider a
classification distance with binary partitions: the domain $V_\omega$ of every
$H_\omega$ in $H$ is partitioned into two (measurable) subsets,
$W^{(1)}_{\omega,v}$ and $W^{(2)}_{\omega,v}$. Making these partitions random,
i.e., allowing the index $v$ to randomly vary in any way whatever, we get a
new p.q.-metric. In the special case when all random variables in $H$ take
their values in the set of real numbers, and $W^{(1)}_{\omega,v}$ is defined
by $z \le v$ ($z \in V_\omega \subset \mathbb{R}$, $v \in \mathbb{R}$), the
randomization of the partitions reduces to that of the separation point $v$.
The p.q.-metric then becomes

$$d_S(A, B) = \Pr[A \le U < B],$$

where $U$ is some random variable. An additively symmetrized (i.e.,
pseudometric) version of this p.q.-metric, $d_S(A, B) + d_S(B, A)$, was
introduced in Taylor (1984, 1985) under the name "separation (pseudo)metric,"
and shown to be a conventional metric if $U$ is chosen stochastically
independent of all random variables in $H$.
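For discrete variables the randomized-partition construction can be computed
exactly. The sketch below assumes the binary partition into the sets
{z <= v} and {z > v}, a separation point U that is independent of (A, B), and
hypothetical pmfs; it compares the mixture of fixed-threshold classification
distances with the direct probability Pr[A <= U < B].

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (A, B) and pmf of an independent separation point U.
joint_ab = {(0, 0): F(1, 4), (0, 2): F(1, 4), (1, 2): F(1, 4), (2, 1): F(1, 4)}
pmf_u = {0: F(1, 2), 1: F(1, 2)}

def d_threshold(v):
    """Classification distance for the fixed partition {z <= v}, {z > v}."""
    return sum(p for (a, b), p in joint_ab.items() if a <= v < b)

# Mixture of the fixed-threshold distances over the distribution of U.
d_mixture = sum(m * d_threshold(v) for v, m in pmf_u.items())

# Direct evaluation of Pr[A <= U < B] under independence of U and (A, B).
d_direct = sum(p * m
               for (a, b), p in joint_ab.items()
               for u, m in pmf_u.items()
               if a <= u < b)
```

The two quantities coincide here only because U was assumed independent of
(A, B); for a dependent U one would need the joint distribution of all three
variables.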
ORDER-DISTANCE AND OTHER METRIC-LIKE FUNCTIONS ON JOINTLY
DISTRIBUTED RANDOM VARIABLES

EHTIBAR N. DZHAFAROV AND JANNE V. KUJALA

Abstract. We construct a class of real-valued nonnegative binary functions on a set of jointly
distributed random variables, which satisfy the triangle inequality and vanish at identical
arguments (pseudo-quasi-metrics). We apply these functions to the problem of selective probabilistic
causality encountered in behavioral sciences and in quantum physics. The problem reduces to
that of ascertaining the existence of a joint distribution for a set of variables with known
distributions of certain subsets of this set. Any violation of the triangle inequality by one of our
functions when applied to such a set rules out the existence of the joint distribution. We focus on
an especially versatile and widely applicable class of pseudo-quasi-metrics called order-distances.
We show, in particular, that the Bell-CHSH-Fine inequalities of quantum physics follow from the
triangle inequalities for appropriately defined order-distances.



We show how certain metric-like functions on jointly distributed random variables {pseudo-quasi- 
metrics introduced in Section [IJ can be used in dealing with the problem of selective probabilistic 
causality (introduced in Section [J), illustrating this on examples taken from behavioral sciences 
and quantum physics (Section [3]). Although most of Section [5] applies to arbitrary pseudo-quasi- 
metrics on jointly distributed random variables, we single out one, termed order- distance, which is 
especially useful due to its versatility. We discuss examples of other pseudo-quasi-metrics and rules 
for their construction in Sectional 

1. Order p. q. -metrics 

Random variables in this paper are understood in the broadest sense, as measurable functions 
X : Vs —?■ V, no restrictions being imposed on the sample spaces {Vs,T,s, Hs) and the induced 
probability spaces, (y,E,/i), with the usual meaning of the terms (sets of values Vs,V, sigma- 
algebras E^,!], and probability measures fis,!^)- In particular, any set X of jointly distributed 
random variables (functions on the same sample space) is a random variable, and its induced 
probability space (or, simply, distribution) X — {V, is referred to as the joint distribution of 

its elements. 

Given a class of random variables not necessarily jointly distributed, let ^* be the class 
of distributions X for all X e For any class function /* : ^* — )■ R (reals), the function 

/ : — > M defined by / {X) — f* (X) is called observable (as it does not depend on sample spaces, 
typically unobservable) . We will conveniently confuse / and /* for observable functions, so that if 
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/ is defined on J", then / (F), identified with /* (F) , is also defined for any F ^ JT with F e JT*. 
(This convention is used in Section[2l when we apply a function defined on a set of random variables 
H to different but identically distributed sets of A-variables.) 

For an arbitrary nonempty set 17, let H = {Huj : w G 17} be a indexed set of jointly distributed 
random variables H^^ with distributions i/^ — (V^, E;^, /i^). For any a,/3 £ 17, the ordered pair 
{Ha, Hp) is a random variable with distribution (V^ x Vg, Sq, x Hp, ij^^^p), and H x H is a. set of 
jointly distributed random variables (hence also a random variable) . 

Definition 1.1. We call an observable function d : H x H R a. pseudo-quasi-metric (p.q.-metric) 
on H if, for all a, /3, 7 G 17, 

(i) d{Ha,Hp)>Q, 

(ii) d{Ha,Ha)=Q, 

(iii) d{Ha,H^) <d{Hc.,Hp)+d{Hp,H^). 

For terminological clarity, the conventional pseudometrics (also called semimetrics) obtain by 
adding the property d {Hq,, Hp) = d {Hp, Ha); the conventional quasimetrics are obtained by adding 
the property a ^ fi ^ d{Ha, Hp) > 0. A conventional metric is both a pseudometric and a 
quasimetric. (See, e.g., [27] for discussion of a variety of metrics and pseudometrics on random 
variables.) 

By obvious argument we can generalize the triangle inequality, (iii) : for any H^^ , ■ ■ ■ , H^i G H 

a>3), 

I 

(1.1) d{Ha„Ha,)<J2d{Ha,^^,Ha.)- 

We refer to this inequality (which plays a central role in this paper) as the chain inequality. 
Let 

and we write a ^ 6 to designate (a, b) G R. Let i? be a total order, that is, transitive, reflexive, and 
connected in the sense that for any (a, b) G 1J(^ 0)gaxO ^ ^/3' least one of the relations a <h 
and b ^ a holds. We define the equivalence a ^ b and strict order a ~< b induced by ^ in the usual 
way. Finally, we assume that for any (a, /3) G 17 x 17, the sets 

{{a,b) : a eVa,b eVp,a ^b} 

are /iQ,^-measurable. This implies the /iQ,^-measurability of the sets 

{{a,b) : a eVa,b eVp,a b} , {{a,b) : a G Vq, 6 G Vs, a ~ 6} . 

Thus, if all $V_\omega$ are intervals of reals, $\preceq$ can be chosen to coincide with $\leq$, and (assuming the
usual Borel sigma-algebra) all the properties above are satisfied. Another example: for arbitrary
$V_\omega$, provided each $\Sigma_\omega$ contains at least $n > 1$ disjoint nonempty sets, one can partition $V_\omega$ as
$\bigcup_{k=1}^{n} V_\omega^{(k)}$ with $V_\omega^{(k)} \in \Sigma_\omega$, and put $a \preceq b$ if and only if $a \in V_\alpha^{(k)}$, $b \in V_\beta^{(l)}$, and $k \leq l$. Again, all
properties above are clearly satisfied.

Definition 1.2. The function

$$D(H_\alpha, H_\beta) = \Pr[H_\alpha \prec H_\beta] = \int_{a \prec b} d\mu_{\alpha,\beta}(a, b)$$

is called an order p.q.-metric, or order-distance, on $H$.

That the definition is well-constructed follows from

Theorem 1.3. Order-distance $D$ is a p.q.-metric on $H$.

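Proof. Let $\alpha, \beta, \gamma \in \Omega$, and $H_\alpha = A$, $H_\beta = B$, and $H_\gamma = X$. That $D(A, B)$ is determined by the
distribution of $(A, B)$ is obvious from the definition. The properties $D(A, B) \geq 0$ and $D(A, A) = 0$
are obvious too. To prove the triangle inequality,

$$\begin{aligned}
D(A, B) = \Pr[A \prec B] &= \Pr[A \prec B \prec X] + \Pr[A \prec B \sim X] \\
&\quad + \Pr[A \prec X \prec B] + \Pr[A \sim X \prec B] + \Pr[X \prec A \prec B], \\
D(A, X) = \Pr[A \prec X] &= \Pr[A \prec X \prec B] + \Pr[A \prec B \sim X] \\
&\quad + \Pr[A \prec B \prec X] + \Pr[A \sim B \prec X] + \Pr[B \prec A \prec X], \\
D(X, B) = \Pr[X \prec B] &= \Pr[X \prec B \prec A] + \Pr[X \prec A \sim B] \\
&\quad + \Pr[X \prec A \prec B] + \Pr[A \sim X \prec B] + \Pr[A \prec X \prec B].
\end{aligned}$$

So

$$\begin{aligned}
D(A, X) + D(X, B) - D(A, B) &= \Pr[B \prec A \prec X] + \Pr[A \sim B \prec X] \\
&\quad + \Pr[X \prec B \prec A] + \Pr[X \prec A \sim B] + \Pr[A \prec X \prec B] \geq 0.
\end{aligned}$$

□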

Since in the last expression all events are pairwise exclusive, we have

$$D(A, X) + D(X, B) - D(A, B) \leq 1.$$

This may seem an attractive addition to the triangle inequality. The inequality is redundant, however,
as it is subsumed by the triangle inequalities holding on $\{A, B, X\}$. Rewritten as

$$D(A, B) + 1 - D(X, B) - D(A, X) \geq 0,$$

it follows immediately from

$$D(A, B) + D(B, X) - D(A, X) \geq 0$$

and

$$D(B, X) = \Pr[B \prec X] \leq 1 - \Pr[X \prec B] = 1 - D(X, B).$$
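
As a numerical illustration of Theorem 1.3 and of the redundant upper bound, the following sketch computes the order-distance for three jointly distributed finite-valued variables, with the order identified with the usual order on integers. The function names, the randomly generated joint distribution, and the choice of three values per variable are assumptions made for this illustration only.

```python
from itertools import permutations, product
import random

def order_distance(joint, i, j):
    """Pr[X_i < X_j] under a joint pmf {tuple_of_values: probability}."""
    return sum(p for values, p in joint.items() if values[i] < values[j])

def random_joint(n_vars=3, n_values=3, seed=0):
    """Random joint pmf over n_vars variables with values 0..n_values-1."""
    rng = random.Random(seed)
    cells = list(product(range(n_values), repeat=n_vars))
    weights = [rng.random() for _ in cells]
    total = sum(weights)
    return {cell: w / total for cell, w in zip(cells, weights)}

joint = random_joint()

# For every ordering (a, b, x) of the three variables, compute
# D(A, X) + D(X, B) - D(A, B), which should lie in [0, 1].
slacks = {
    (a, b, x): order_distance(joint, a, x)
    + order_distance(joint, x, b)
    - order_distance(joint, a, b)
    for a, b, x in permutations(range(3))
}
print(min(slacks.values()), max(slacks.values()))
```

Because the three variables here are jointly distributed by construction, every slack must be nonnegative; a negative value can only arise when the distances come from pairs that admit no common joint distribution.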

2. Selective probabilistic causality 

Consider an indexed set $W = \{W^\lambda : \lambda \in \Lambda\}$, with each $W^\lambda$ being a set referred to as a (deterministic)
input, with the elements of $\{\lambda\} \times W^\lambda$ called input points. Input points therefore are
pairs of the form $x = (\lambda, w)$, with $w \in W^\lambda$, and should not be confused with input values $w$. A
nonempty set $\Phi \subseteq \prod_{\lambda \in \Lambda} W^\lambda$ is called a set of (allowable) treatments. A treatment therefore is a
function $\phi : \Lambda \to \bigcup_{\lambda \in \Lambda} W^\lambda$ such that $\phi(\lambda) \in W^\lambda$ for any $\lambda \in \Lambda$. Note that the symbol $\phi$ not followed
by an argument always refers to the entire function, the set $\{(\lambda, \phi(\lambda)) : \lambda \in \Lambda\}$.

In the following we use two kinds of random variables: those indexed as $A^\lambda_\phi$ (each corresponding to
a fixed index $\lambda \in \Lambda$ and a fixed function $\phi$) and those indexed as $H^\lambda_w$ (with $w \in W^\lambda$), corresponding
to input points $(\lambda, w)$.

Let there be a collection of sets of random variables, referred to as (random) outputs,

$$A_\phi = \{A^\lambda_\phi : \lambda \in \Lambda\}, \qquad \phi \in \Phi,$$

such that the distribution of $A_\phi$ (i.e., the joint distribution of all $A^\lambda_\phi$ in $A_\phi$) is known for every
treatment $\phi$. We define

$$A^\lambda = \{A^\lambda_\phi : \phi \in \Phi\}, \qquad \lambda \in \Lambda,$$

with the understanding that $A^\lambda$ is not a random variable (i.e., $A^\lambda_\phi$ for different $\phi$ are not jointly
distributed). To illustrate the notation, let $\Lambda = \{1, 2, \ldots\}$ and $W^\lambda$ be the set of reals for all $\lambda \in \Lambda$. A
treatment then is a real-valued function (sequence) $\{(1, \phi(1)), (2, \phi(2)), \ldots\} = (\phi(1), \phi(2), \ldots)$,
where $\phi(1) \in W^1$, $\phi(2) \in W^2$, etc. Let $\Phi$ be a nonempty set of such sequences. Fixing one of them,

$$\phi = (w_1, w_2, \ldots),$$

we have $A_\phi = \{A^1_\phi, A^2_\phi, \ldots\}$; fixing, say, $\lambda = 2$ and allowing $(w_1, w_2, \ldots)$ to range over $\Phi$,

$$A^\lambda = A^2 = \{A^2_{(w_1, w_2, \ldots)} : (w_1, w_2, \ldots) \in \Phi\}.$$

The following problem is encountered in a wide variety of contexts [6l El [15] . We say that the 
dependence of random outputs $A^\lambda$ on the deterministic inputs $W^\lambda$ is (canonically) selective if, for
any distinct $\lambda, \lambda' \in \Lambda$ and any $\phi \in \Phi$, the output $A^\lambda_\phi$ is "not influenced" by $\phi(\lambda')$. The question is
how one should define this selectivity of "influences" rigorously, and how one can determine whether
this selectivity holds. This problem was introduced to behavioral sciences by Sternberg [18] and
Townsend [22]. In quantum physics, using different terminology, it was introduced by Bell [3] and
elaborated by Fine [10, 11]. The definition can be given in several equivalent forms, of which we
present the one focal for the present context.

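Definition 2.1. The dependence of outputs $\{A^\lambda : \lambda \in \Lambda\}$ on inputs $\{W^\lambda : \lambda \in \Lambda\}$ (or the "influence"
of the latter on the former) is (canonically) selective if there is a set of jointly distributed
random variables

$$H = \{H^\lambda_w : w \in W^\lambda, \lambda \in \Lambda\}$$

(one random variable for every value of every input), such that, for any treatment $\phi \in \Phi$,

$$\overline{H}_\phi = \overline{A}_\phi$$

(that is, $H_\phi$ and $A_\phi$ are identically distributed), where

$$H_\phi = \{H^\lambda_{\phi(\lambda)} : \lambda \in \Lambda\}$$

and

$$A_\phi = \{A^\lambda_\phi : \lambda \in \Lambda\}$$

(the corresponding elements of $H_\phi$ and $A_\phi$ being those sharing the same $\lambda$).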

This definition is known as the Joint Distribution Criterion (JDC) for selectivity of influences,
and the set $H$ satisfying this definition is referred to as a (hypothetical) JDC-set. Specialized forms
of this criterion in quantum physics can be found in [19] and [10, 11]; in the behavioral context and
in complete generality this criterion is given (derived from an equivalent definition) in [8].

Remark 2.2. The adjective "canonical" in the definition refers to the one-to-one correspondence
between $W^\lambda$ and $A^\lambda$ sharing the same $\lambda$. A seemingly more general scheme, in which different
$A^\lambda$ are selectively influenced by different (possibly overlapping) subsets of $\{W^\lambda : \lambda \in \Lambda\}$, is always
reducible to the canonical form by considering, for every $A^\lambda$, the Cartesian product of the inputs
influencing it as a single input, and redefining correspondingly the sets of input points and the set of
allowable treatments.
The simplest consequence of JDC is that the selectivity of influences implies marginal selectivity
[6l m], defined as follows. For any $\Lambda' \subset \Lambda$ we can uniquely present any $\phi \in \Phi$ as $\phi' \cup \bar{\phi}'$,
where $\phi' \in \prod_{\lambda \in \Lambda'} W^\lambda$ and $\bar{\phi}' \in \prod_{\lambda \in \Lambda - \Lambda'} W^\lambda$. Then, if JDC is satisfied, the joint distribution of
$\{A^\lambda_\phi : \lambda \in \Lambda'\}$ does not depend on $\bar{\phi}'$.

Remark 2.3. In the following we always assume that marginal selectivity is satisfied. 

The relevance of the order-distance and other p.q.-metrics on the sets of jointly distributed 
random variables to the problem of selectivity lies in the general test (necessary condition) for 
selectivity of influences, formulated after the following definition. 

Definition 2.4. We call a sequence of input points

$$x_1 = (\alpha_1, w_1), \ \ldots, \ x_l = (\alpha_l, w_l)$$

(where $w_i \in W^{\alpha_i}$ for $i = 1, \ldots, l$, and $l \geq 3$) treatment-realizable if there are treatments $\phi^1, \ldots, \phi^l \in \Phi$
(not necessarily pairwise distinct), such that

$\{x_1, x_l\} \subseteq \phi^1$ and $\{x_{i-1}, x_i\} \subseteq \phi^i$ for $i = 2, \ldots, l$.
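
Definition 2.4 can be checked mechanically when the set of treatments is finite. The sketch below is one possible encoding, not taken from the paper: a treatment is a dictionary mapping each input index to a value, an input point is a pair `(index, value)`, and the function names are hypothetical.

```python
def contained_in_some_treatment(points, treatments):
    """True if at least one treatment contains all of the given input points."""
    return any(
        all(index in phi and phi[index] == value for index, value in points)
        for phi in treatments
    )

def is_treatment_realizable(sequence, treatments):
    """Check Definition 2.4 for a sequence x_1, ..., x_l of input points (l >= 3)."""
    if len(sequence) < 3:
        return False
    # The closing pair {x_1, x_l} and every consecutive pair {x_{i-1}, x_i}.
    pairs = [(sequence[0], sequence[-1])]
    pairs += [(sequence[i - 1], sequence[i]) for i in range(1, len(sequence))]
    return all(contained_in_some_treatment(pair, treatments) for pair in pairs)

# Two inputs with values {0, 1} each; all four combinations are allowable.
treatments = [{1: w1, 2: w2} for w1 in (0, 1) for w2 in (0, 1)]

print(is_treatment_realizable([(1, 0), (2, 1), (1, 1), (2, 0)], treatments))
print(is_treatment_realizable([(1, 0), (1, 1), (2, 0)], treatments))
```

The second sequence fails because the pair (1, 0), (1, 1) consists of two different values of the same input, which no single treatment can contain.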

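If a JDC-set $H$ exists, then for any p.q.-metric $d$ on $H$ we should have

$$d(H^{\alpha_1}_{w_1}, H^{\alpha_l}_{w_l}) = d(A^{\alpha_1}_{\phi^1}, A^{\alpha_l}_{\phi^1})$$

and

$$d(H^{\alpha_{i-1}}_{w_{i-1}}, H^{\alpha_i}_{w_i}) = d(A^{\alpha_{i-1}}_{\phi^i}, A^{\alpha_i}_{\phi^i})$$

for $i = 2, \ldots, l$, whence

$$d(A^{\alpha_1}_{\phi^1}, A^{\alpha_l}_{\phi^1}) \leq \sum_{i=2}^{l} d(A^{\alpha_{i-1}}_{\phi^i}, A^{\alpha_i}_{\phi^i}). \tag{2.1}$$

This chain inequality, written entirely in terms of observable probabilities, is referred to as a
p.q.-metric test for selectivity of influences. If this inequality is violated for at least one
treatment-realizable sequence of input points, no JDC-set $H$ exists, and the selectivity is ruled out. Note:
if the sequence $\phi^1, \ldots, \phi^l \in \Phi$ for a given $x_1, \ldots, x_l$ can be chosen in more than one way, the
observable quantities $d(A^{\alpha_1}_{\phi^1}, A^{\alpha_l}_{\phi^1})$ and $d(A^{\alpha_{i-1}}_{\phi^i}, A^{\alpha_i}_{\phi^i})$ remain invariant due to the (tacitly
assumed) marginal selectivity.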

As an example, let $\Lambda = \{1, 2\}$, $W^1 = [0, 1]$, $W^2 = [0, 1]$, $\Phi = W^1 \times W^2$. Let $\{A^1_\phi, A^2_\phi\}$ for any
treatment $\phi$ have a bivariate normal distribution with zero means, unit variances, and correlation
$\rho = \min(1, w_1 + w_2)$, where $w_1 = \phi(1)$, $w_2 = \phi(2)$. Marginal selectivity is trivially satisfied. Do
$\{W^1, W^2\}$ influence $\{A^1, A^2\}$ selectively? For any bivariate normally distributed $(A, B)$, let us
define $A \prec B$ if and only if $A < 0$ and $B \geq 0$. Then the corresponding order-distance on the hypothetical JDC-set
$H$ is

$$D(H^1_{w_1}, H^2_{w_2}) = \frac{\arccos(\min(1, w_1 + w_2))}{2\pi}.$$

The sequence of input points $(1, 0), (2, 1), (1, 1), (2, 0)$ is treatment-realizable, so if $H$ exists, we
should have

$$D(H^1_0, H^2_0) \leq D(H^1_0, H^2_1) + D(H^2_1, H^1_1) + D(H^1_1, H^2_0).$$






The numerical substitutions yield, however,

$$\tfrac{1}{4} \leq 0 + 0 + 0,$$

and as this is false, the hypothesis that $\{W^1, W^2\}$ influence $\{A^1, A^2\}$ selectively is rejected.
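
The arithmetic of this example can be reproduced in a few lines. The sketch assumes the closed form arccos(rho)/(2 pi) for the probability that the first of two zero-mean, unit-variance bivariate normal variables is negative while the second is nonnegative; the function name is chosen for this illustration.

```python
import math

def order_distance(w1, w2):
    """D between H^1_{w1} and H^2_{w2}: Pr[first < 0, second >= 0] for correlation rho.

    By the symmetry of the zero-mean bivariate normal distribution the value
    is the same whichever of the two variables is taken as the first argument.
    """
    rho = min(1.0, w1 + w2)
    return math.acos(rho) / (2 * math.pi)

# Chain inequality for the input points (1,0), (2,1), (1,1), (2,0):
# D(H^1_0, H^2_0) <= D(H^1_0, H^2_1) + D(H^2_1, H^1_1) + D(H^1_1, H^2_0).
lhs = order_distance(0, 0)
rhs = order_distance(0, 1) + order_distance(1, 1) + order_distance(1, 0)
print(lhs, rhs, lhs <= rhs)
```

The left-hand side corresponds to zero correlation and the three right-hand terms to correlation 1, which is why the chain inequality fails.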

The theorem below and its corollary show that one only needs to check the chain inequality for
a special subset of all possible treatment-realizable sequences $x_1, \ldots, x_l$.

Definition 2.5. A treatment-realizable sequence $x_1, \ldots, x_l$ is called irreducible if $x_1 \neq x_l$ and the
only subsequences $\{x_{i_1}, \ldots, x_{i_k}\}$ with $k > 1$ that are subsets of treatments are pairs $\{x_1, x_l\}$ and $\{x_{i-1}, x_i\}$,
for $i = 2, \ldots, l$. Otherwise the sequence is reducible.

Theorem 2.6. Given a p.q.-metric $d$ on the hypothetical JDC-set $H$, inequality (2.1) is satisfied
for all treatment-realizable sequences if and only if this inequality holds for all irreducible sequences.

Proof. We prove this theorem by showing that if (2.1) is violated for some reducible sequence
$x_1, \ldots, x_l$, then it is violated for some proper subsequence thereof. Clearly, $x_1 \neq x_l$ because
otherwise (2.1) is not violated. For $l = 3$, $x_1, x_2, x_3$ is reducible only if it is contained in a treatment:
but then (2.1) would be satisfied. So $l > 3$, and the reducibility of $x_1, \ldots, x_l$ means that there is a
pair $\{x_p, x_q\}$ belonging to a treatment, with $(p, q) \neq (1, l)$ and $q > p + 1$. But then (2.1) must be
violated for either $x_p, \ldots, x_q$ or $x_1, \ldots, x_p, x_q, \ldots, x_l$ (allowing for $p = 1$ or $q = l$ but not both). □

If $\Phi = \prod_{\lambda \in \Lambda} W^\lambda$ (all logically possible treatments are allowable), then any subsequence $x_{i_1}, \ldots, x_{i_k}$
of input points with pairwise distinct $\alpha_{i_1}, \ldots, \alpha_{i_k}$ belongs to some treatment. Therefore an
irreducible sequence cannot contain points of more than two inputs, and it is easy to see that then it
must be a sequence of pairwise distinct $x_1 \in \{\alpha\} \times W^\alpha$, $x_2 \in \{\beta\} \times W^\beta$, $\ldots$, $x_{2m-1} \in \{\alpha\} \times W^\alpha$,
$x_{2m} \in \{\beta\} \times W^\beta$ ($\alpha \neq \beta$). It is also easy to see that if $m > 2$, each of the subsets $\{x_1, x_4\}$ and $\{x_2, x_5\}$
will belong to a treatment. Hence $m = 2$ is the only possibility for an irreducible sequence.

Corollary 2.7. If $\Phi = \prod_{\lambda \in \Lambda} W^\lambda$, then inequality (2.1) is satisfied for all treatment-realizable
sequences if and only if this inequality holds for all tetradic sequences of the form $x, y, s, t$, with
$x, s \in \{\alpha\} \times W^\alpha$, $y, t \in \{\beta\} \times W^\beta$, $x \neq s$, $y \neq t$, $\alpha \neq \beta$.

Remark 2.8. This formulation is given in [8], although there it is unnecessarily confined to metrics
of a special kind.

3. An application

The four tables below represent results of an experiment with a $2 \times 2$ factorial design, $\{x, x'\} \times
\{y, y'\}$, and two binary responses, $A$ and $B$. In relation to our general notation, we have here
$\Lambda = \{1, 2\}$, $W^1 = \{x, x'\}$, $W^2 = \{y, y'\}$, and four treatments $(x, y), \ldots, (x', y')$; for every treatment
$\phi$, the random outputs $A^1_\phi$ and $A^2_\phi$ are represented by, respectively, $A_\phi$ and $B_\phi$, each having two
possible values, arbitrarily labeled. This design is arguably the simplest possible, and it is ubiquitous
in science. In a psychological double-detection experiment (see, e.g., [23]), the input values may
represent presence ($x$ and $y$) or absence ($x'$ and $y'$) of a designated signal in two stimuli labeled 1
and 2, presented side-by-side. The participant in such an experiment is asked to indicate whether
the signal was present or absent in stimulus 1 and in stimulus 2. The output values $A = \circ$ and
$B = \square$ may indicate either that the response was "signal present" or that the response was correct;
and analogously for $A = \bullet$ and $B = \blacksquare$ (either "signal absent" or an incorrect response). The entries
$p_{ij}, q_{ij}$, etc. represent joint probabilities of the corresponding outcomes; $a_{i\cdot}, a'_{i\cdot}$, etc. represent
marginal probabilities. The question to be answered is: does the response to a given stimulus ($A$
to 1 and $B$ to 2) selectively depend on that stimulus alone (despite $A$ and $B$ being stochastically
dependent for every treatment), or is $A$ or $B$ influenced by both 1 and 2?
Another important situation in which we encounter formally the same problem is the Einstein-
Podolsky-Rosen (EPR) paradigm. Two particles are emitted from a common source in such a way 
that they remain entangled (have highly correlated properties, such as momenta or spins) as they 
run away from each other [1, 16]. An experiment may consist, e.g., in measuring the spin of electron
1 along one of two axes, $x$ or $x'$, and (in another location but simultaneously in some inertial frame
of reference) measuring the spin of electron 2 along one of two axes, $y$ or $y'$. The outcome $A$ of
a measurement on electron 1 is a random variable with two possible values, "up" or "down," and 
the same holds for B, the outcome of a measurement on electron 2. The question here is: do the 
measurements on electrons 1 and 2 selectively affect, respectively, A and B (even though generally A 
and B are not independent at any of the four combinations of spin axes)? If the answer is negative, 
then the measurement of one electron affects the outcome of the measurement of another electron 
even though no signal can be exchanged between two distant events that are simultaneous in some 
frame of reference. What makes this situation formally identical to the double-detection example 
described above is that the measurements performed along different axes on the same particle, x 
and x' or y and y', are non-commuting, i.e., they cannot be performed simultaneously. This makes 
it possible to consider such measurements as mutually exclusive values of an input. 

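Theorem 3.1 (Fine [10, 11]). A JDC-set $H = \{H^1_x, H^1_{x'}, H^2_y, H^2_{y'}\}$ satisfying

$$\begin{aligned}
\overline{\{H^1_x, H^2_y\}} &= \overline{\{A_{xy}, B_{xy}\}}, & \overline{\{H^1_x, H^2_{y'}\}} &= \overline{\{A_{xy'}, B_{xy'}\}}, \\
\overline{\{H^1_{x'}, H^2_y\}} &= \overline{\{A_{x'y}, B_{x'y}\}}, & \overline{\{H^1_{x'}, H^2_{y'}\}} &= \overline{\{A_{x'y'}, B_{x'y'}\}}
\end{aligned}$$

exists if and only if the following eight inequalities are satisfied:

$$\begin{aligned}
-1 &\leq p_{11} + r_{11} + s_{11} - q_{11} - a'_{1\cdot} - b_{\cdot 1} \leq 0, \\
-1 &\leq q_{11} + s_{11} + r_{11} - p_{11} - a'_{1\cdot} - b'_{\cdot 1} \leq 0, \\
-1 &\leq r_{11} + p_{11} + q_{11} - s_{11} - a_{1\cdot} - b_{\cdot 1} \leq 0, \\
-1 &\leq s_{11} + q_{11} + p_{11} - r_{11} - a_{1\cdot} - b'_{\cdot 1} \leq 0.
\end{aligned} \tag{3.1}$$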

We refer to (3.1) as Bell-CHSH-Fine inequalities, where CHSH abbreviates Clauser, Horne,
Shimony, & Holt H]: in this work Bell's [3] approach was developed into a special version of (3.1).
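
Checking (3.1) for a given set of observable probabilities is a direct computation. The sketch below uses hypothetical argument names (`a1`, `a1p`, `b1`, `b1p` for the four first-value marginals) and two made-up sets of numbers: one from independent fair binary outputs, and one with perfect agreement under three treatments and perfect disagreement under the fourth.

```python
def bell_chsh_fine(p11, q11, r11, s11, a1, a1p, b1, b1p, tol=1e-12):
    """Return the four middle expressions of (3.1) and whether all eight inequalities hold."""
    exprs = [
        p11 + r11 + s11 - q11 - a1p - b1,
        q11 + s11 + r11 - p11 - a1p - b1p,
        r11 + p11 + q11 - s11 - a1 - b1,
        s11 + q11 + p11 - r11 - a1 - b1p,
    ]
    ok = all(-1 - tol <= e <= tol for e in exprs)
    return exprs, ok

# Illustrative numbers only; all marginals are 1/2 in both cases.
exprs_independent, ok_independent = bell_chsh_fine(
    0.25, 0.25, 0.25, 0.25, 0.5, 0.5, 0.5, 0.5
)
exprs_violating, ok_violating = bell_chsh_fine(
    0.5, 0.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5
)
print(ok_independent, ok_violating)
```

In the second case the first expression exceeds zero, so by Theorem 3.1 no JDC-set exists for those tables.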

Remark 3.2. The proof given in [10, 11] that (3.1) is both necessary and sufficient (under marginal
selectivity) for the existence of a JDC-set can be conceptually simplified: the Bell-CHSH-Fine
inequalities can be algebraically shown to be the criterion for the existence of a vector $Q$ with 16
probabilities

$$\Pr[H^1_x = \bullet, H^1_{x'} = \bullet, H^2_y = \blacksquare, H^2_{y'} = \blacksquare], \ \ldots, \ \Pr[H^1_x = \circ, H^1_{x'} = \circ, H^2_y = \square, H^2_{y'} = \square]$$

that sum to one and whose appropriately chosen partial sums yield the 8 observable probabilities

$$p_{11}, \ q_{11}, \ r_{11}, \ s_{11}, \ a_{1\cdot}, \ b_{\cdot 1}, \ a'_{1\cdot}, \ b'_{\cdot 1}$$

(other probabilities being determined due to marginal selectivity). This is a simple linear programming
task, and the Bell-CHSH-Fine inequalities can be derived "mechanically" by a facet
enumeration algorithm (see [25, 26] and [2]). For extensions of the Bell-CHSH-Fine inequalities to
multiple particles, multiple spin axes, and multiple random outputs, see P] and [17]. For modern 
accounts of mathematical and interpretational aspects of the entanglement problem in quantum 
physics, see [H [131 [14]. 
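
The "partial sums" in Remark 3.2 can be made concrete. The sketch below goes only in the easy direction: it starts from an arbitrary vector Q of 16 probabilities (so a JDC-set exists by construction), forms the eight observable probabilities as partial sums, and confirms that the four middle expressions of the Bell-CHSH-Fine inequalities fall between -1 and 0. The encoding of the two labels as 1 and 2, the variable order, and all function names are assumptions of this illustration; the converse direction (finding Q from the observables) would require a linear programming solver and is not shown.

```python
from itertools import product
import random

# Hypothetical encoding: each of (H1_x, H1_x', H2_y, H2_y') takes the value
# 1 (first label) or 2 (second label); CELLS lists the 16 combinations.
CELLS = list(product((1, 2), repeat=4))

def random_Q(seed):
    """A random probability vector over the 16 joint value combinations."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in CELLS]
    total = sum(weights)
    return {cell: w / total for cell, w in zip(CELLS, weights)}

def observables(Q):
    """Partial sums of Q that give the eight observable probabilities."""
    def pr(*positions):
        # Probability that every listed variable takes its first value.
        return sum(p for cell, p in Q.items() if all(cell[k] == 1 for k in positions))
    return dict(p11=pr(0, 2), q11=pr(0, 3), r11=pr(1, 2), s11=pr(1, 3),
                a1=pr(0), a1p=pr(1), b1=pr(2), b1p=pr(3))

def bcf_expressions(o):
    """The four middle expressions of the Bell-CHSH-Fine inequalities."""
    return [
        o["p11"] + o["r11"] + o["s11"] - o["q11"] - o["a1p"] - o["b1"],
        o["q11"] + o["s11"] + o["r11"] - o["p11"] - o["a1p"] - o["b1p"],
        o["r11"] + o["p11"] + o["q11"] - o["s11"] - o["a1"] - o["b1"],
        o["s11"] + o["q11"] + o["p11"] - o["r11"] - o["a1"] - o["b1p"],
    ]

values = [e for seed in range(20) for e in bcf_expressions(observables(random_Q(seed)))]
print(min(values), max(values))
```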

The point of interest in the present context is that the Bell-CHSH-Fine inequalities, whose rather 
obscure structure does not seem to fit their fundamental importance, turn out to be interpretable 
as the triangle inequalities for appropriately chosen order-distances. 

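Consider the chain inequalities for the order-distance $D_1$ obtained by putting $\bullet = \blacksquare = 1$, $\circ =
\square = 2$, and identifying $\preceq$ with $\leq$:

$$\begin{aligned}
q_{12} = D_1(H^1_x, H^2_{y'}) &\leq D_1(H^1_x, H^2_y) + D_1(H^2_y, H^1_{x'}) + D_1(H^1_{x'}, H^2_{y'}) = p_{12} + r_{21} + s_{12}, \\
p_{12} = D_1(H^1_x, H^2_y) &\leq D_1(H^1_x, H^2_{y'}) + D_1(H^2_{y'}, H^1_{x'}) + D_1(H^1_{x'}, H^2_y) = q_{12} + s_{21} + r_{12}, \\
s_{12} = D_1(H^1_{x'}, H^2_{y'}) &\leq D_1(H^1_{x'}, H^2_y) + D_1(H^2_y, H^1_x) + D_1(H^1_x, H^2_{y'}) = r_{12} + p_{21} + q_{12}, \\
r_{12} = D_1(H^1_{x'}, H^2_y) &\leq D_1(H^1_{x'}, H^2_{y'}) + D_1(H^2_{y'}, H^1_x) + D_1(H^1_x, H^2_y) = s_{12} + q_{21} + p_{12}.
\end{aligned} \tag{3.2}$$

Consider also the inequalities for the order-distance $D_2$ obtained by putting $\bullet = \square = 1$, $\circ = \blacksquare = 2$,
and identifying $\preceq$ with $\leq$:

$$\begin{aligned}
q_{11} = D_2(H^1_x, H^2_{y'}) &\leq D_2(H^1_x, H^2_y) + D_2(H^2_y, H^1_{x'}) + D_2(H^1_{x'}, H^2_{y'}) = p_{11} + r_{22} + s_{11}, \\
p_{11} = D_2(H^1_x, H^2_y) &\leq D_2(H^1_x, H^2_{y'}) + D_2(H^2_{y'}, H^1_{x'}) + D_2(H^1_{x'}, H^2_y) = q_{11} + s_{22} + r_{11}, \\
s_{11} = D_2(H^1_{x'}, H^2_{y'}) &\leq D_2(H^1_{x'}, H^2_y) + D_2(H^2_y, H^1_x) + D_2(H^1_x, H^2_{y'}) = r_{11} + p_{22} + q_{11}, \\
r_{11} = D_2(H^1_{x'}, H^2_y) &\leq D_2(H^1_{x'}, H^2_{y'}) + D_2(H^2_{y'}, H^1_x) + D_2(H^1_x, H^2_y) = s_{11} + q_{22} + p_{11}.
\end{aligned} \tag{3.3}$$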

Theorem 3.3. Each right-hand Bell-CHSH-Fine inequality is equivalent to the corresponding chain
inequality in (3.2) for the order-distance $D_1$. Each left-hand Bell-CHSH-Fine inequality is equivalent
to the corresponding chain inequality in (3.3) for the order-distance $D_2$.

Proof. We show the proof for the first of the Bell-CHSH-Fine double-inequalities. The equivalence
of

$$p_{11} + r_{11} + s_{11} - q_{11} - a'_{1\cdot} - b_{\cdot 1} \leq 0$$

to

$$q_{12} \leq p_{12} + r_{21} + s_{12}$$

obtains by using the identities

$$\begin{aligned}
q_{12} &= a_{1\cdot} - q_{11}, \\
p_{12} &= a_{1\cdot} - p_{11}, \\
r_{21} &= b_{\cdot 1} - r_{11}, \\
s_{12} &= a'_{1\cdot} - s_{11}.
\end{aligned}$$

The equivalence of

$$p_{11} + r_{11} + s_{11} - q_{11} - a'_{1\cdot} - b_{\cdot 1} \geq -1$$

to

$$q_{11} \leq p_{11} + r_{22} + s_{11}$$

follows from the identity

$$r_{22} = 1 + r_{11} - a'_{1\cdot} - b_{\cdot 1}.$$

□
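
The identities used in the proof imply that the slack of the first chain inequality for $D_1$ equals minus the first Bell-CHSH-Fine expression, and the slack of the first chain inequality for $D_2$ equals one plus that expression. The sketch below checks this numerically; the helper names and the two sets of input numbers are illustrative assumptions, and marginal selectivity is built in by letting tables share their marginals.

```python
import math

def table(t11, row1, col1):
    """2x2 joint table [[t11, t12], [t21, t22]] from t11 and the two first-value marginals."""
    return [[t11, row1 - t11], [col1 - t11, 1 - row1 - col1 + t11]]

def slacks(p11, q11, r11, s11, a1, a1p, b1, b1p):
    p = table(p11, a1, b1)     # treatment (x, y)
    q = table(q11, a1, b1p)    # treatment (x, y')
    r = table(r11, a1p, b1)    # treatment (x', y)
    s = table(s11, a1p, b1p)   # treatment (x', y')
    bcf = p11 + r11 + s11 - q11 - a1p - b1
    # Entry t_ij is stored at [i - 1][j - 1].
    d1 = p[0][1] + r[1][0] + s[0][1] - q[0][1]   # p12 + r21 + s12 - q12
    d2 = p[0][0] + r[1][1] + s[0][0] - q[0][0]   # p11 + r22 + s11 - q11
    return bcf, d1, d2

cases = [
    (0.5, 0.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5),
    (0.3, 0.5, 0.1, 0.2, 0.6, 0.3, 0.4, 0.7),
]
results = [slacks(*case) for case in cases]
for bcf, d1, d2 in results:
    print(bcf, d1, d2)
```

So the right-hand inequality (expression at most 0) holds exactly when the $D_1$ slack is nonnegative, and the left-hand inequality (expression at least -1) holds exactly when the $D_2$ slack is nonnegative.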

4. Concluding remarks 

The order-distances are versatile and have a broad sphere of applicability because order relations
on the domains of any given set of random variables can always be defined in many different ways. If
no other structure is available, this can always be done by the partitioning of the domains mentioned
in Section 1 and used in the example with bivariate normal distributions in Section 2 as well as for
the binary variables of the previous section: $V_\omega = \bigcup_{k=1}^{n} V_\omega^{(k)}$, $V_\omega^{(k)} \in \Sigma_\omega$, $\omega \in \Omega$, putting $a \preceq b$ if
and only if $a \in V_\alpha^{(k)}$, $b \in V_\beta^{(l)}$, and $k \leq l$. Due to its universality and convenience of use, it deserves
a special name, classification distance.

There are numerous ways of creating new p.q.-metrics from the ones already constructed, including
those taken from outside probabilistic context. Thus, if $d$ is a p.q.-metric on a set $S$, then,
for any set $H$ of jointly distributed random variables taking their values in $S$,

$$D(A, B) = \mathrm{E}[d(A, B)], \qquad A, B \in H,$$

is a p.q.-metric on $H$. This follows from the fact that expectation $\mathrm{E}$ preserves inequalities and
equalities identically satisfied for all possible realizations of the arguments. Another example:
given any family of p.q.-metrics $\{d_v : v \in \Upsilon\}$, their average with respect to a random variable $U$
with a probability measure $m$,

$$d(A, B) = \int d_v(A, B)\, dm(v),$$

is a p.q.-metric. As a special case, consider a classification distance with binary partitions: the
domain of every $H_\omega$ in $H$ is partitioned into two (measurable) subsets, $W^{(1)}_{\omega,v}$ and $W^{(2)}_{\omega,v}$. Making
these partitions random, i.e., allowing the index $v$ to randomly vary in any way whatever, we get
a new p.q.-metric. In the special case when all random variables in $H$ take their values in the set
of real numbers, and $W^{(1)}_{\omega,v}$ is defined by $z \leq v$ ($z \in V_\omega \subseteq \mathbb{R}$, $v \in \mathbb{R}$), the randomization of the
partitions reduces to that of the separation point $v$. The p.q.-metric then becomes

$$d_S(A, B) = \Pr[A \leq U < B],$$

where $U$ is some random variable. An additively symmetrized (i.e., pseudometric) version of
this p.q.-metric, $d_S(A, B) + d_S(B, A)$, was introduced in [20, 21] under the name "separation
(pseudo)metric," and shown to be a conventional metric if $U$ is chosen stochastically independent
of all random variables in $H$.
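
A small exact computation illustrates the separation p.q.-metric on finite-valued variables. The sketch assumes integer-valued variables, a separation point U that is uniform on {0, 1, 2} and independent of the variables in H, and a randomly generated joint distribution; all names and numbers are chosen for this illustration.

```python
from itertools import permutations, product
import random

def separation_distance(joint, u_pmf, i, j):
    """Pr[X_i <= U < X_j], with U independent of the X's (an assumption of this sketch)."""
    return sum(
        p * q
        for values, p in joint.items()
        for u, q in u_pmf.items()
        if values[i] <= u < values[j]
    )

rng = random.Random(1)
cells = list(product(range(4), repeat=3))     # three variables with values 0..3
weights = [rng.random() for _ in cells]
total = sum(weights)
joint = {cell: w / total for cell, w in zip(cells, weights)}
u_pmf = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}        # distribution of the separation point U

def d_s(i, j):
    return separation_distance(joint, u_pmf, i, j)

# Collect every ordered triple for which the triangle inequality would fail.
violations = [
    (a, b, x)
    for a, b, x in permutations(range(3))
    if d_s(a, b) > d_s(a, x) + d_s(x, b) + 1e-12
]
print(violations)
```

The triangle inequality holds realization by realization: if A is at most u and u is below B, then either X exceeds u or X is at most u, so one of the two events on the right-hand side occurs.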